Estimating the epidemiology of chronic Hepatitis B Virus (HBV) infection in the UK: what do we know and what are we missing? [version 1; peer review: 1 approved, 2 approved with reservations]

Background: HBV is the leading global cause of cirrhosis and primary liver cancer. However, the UK HBV population has not been well characterised

prevalence of <0.5%. Discussion: Estimates varied by sources of error, bias and missingness, data linkage, and substantial "blind spots" in consistent testing and registration of HBV diagnoses. The HBV burden in the UK is likely to be concentrated in vulnerable populations who may not be well represented in existing datasets, including those experiencing socioeconomic deprivation, ethnic minorities, people experiencing homelessness and people born in high-prevalence countries. Together, these factors could lead to either under- or over-estimation of overall prevalence, and additional efforts are required to provide estimates that best reflect the whole population. Multi-parameter evidence synthesis and back-calculation model methods similar to

Introduction
Hepatitis B virus (HBV) is the leading global cause of cirrhosis, and of primary liver cancer incidence and mortality 1,2 . Nearly 300 million individuals worldwide are estimated to be living with chronic HBV (CHB) infection. Risks of complications and death are mitigated by screening to detect cases of infection, clinical monitoring of chronic infection (including liver cancer surveillance in high-risk cases), and antiviral therapy in those who meet treatment criteria 3 .
The United Kingdom (UK) is regarded as a low-prevalence setting for CHB 3 . However, the attributable disease burden may be substantial in specific population subgroups including people who inject drugs, the prison population, people experiencing homelessness, and individuals belonging to minority ethnic groups and born in countries where the prevalence of CHB is higher 4 . Thus, CHB is concentrated in potentially vulnerable and/or disadvantaged population subgroups.
Epidemiological characterisation of the UK CHB population has been limited, with no central registry of infected persons. Existing data may primarily reflect new diagnoses (a combination of incident acute infection and new diagnoses of chronic infection), but caution is needed in making inferences about prevalence. Accurate estimation of prevalence is challenging, because complete HBV data are not likely to be well captured by large-scale electronic health record (EHR) databases for either primary or secondary care 5 , as many CHB cases remain untested and therefore undiagnosed.
The World Health Organization (WHO) has set targets for viral hepatitis elimination within its Sustainable Development Goals for 2030. The Global Health Sector Strategy on Viral Hepatitis 6 identifies specific goals, including diagnosis in 90% of chronic infections, 90% reduction in incidence of chronic infection, and 80% treatment coverage in those eligible. High quality epidemiological data are therefore crucial to focus and measure progress, inform policy and interventions, reduce inequities and underpin resource allocation. We herein summarise datasets that are available to represent UK CHB epidemiology, consider differences between sources, and discuss deficiencies in current estimates.

Methods
We searched for estimates of CHB case numbers in the UK (incorporating incidence and/or prevalence-like data) across a range of available sources. We included UK-wide reports from government bodies, publications from independent bodies (including medical charities and non-governmental organisations) and articles in peer-reviewed scientific journals. We present positivity rates from each respective data source, but caution that these estimates are not representative of the true UK-wide population prevalence. Details of study samples/denominator are provided. The Office for National Statistics (ONS) provides UK population estimates as a point of reference for the overall denominator 7 .
We also used data from the UK primary care database QResearch, which contains over 35 million patient records from more than 1800 individual practices 7 . We identified individuals in the QResearch (version 44) database who had a record of a diagnostic Systematized Nomenclature of Medicine (SNOMED)/Read or International Classification of Diseases (ICD) code indicative of CHB, or who had a history of ≥1 hepatitis B surface antigen (HBsAg) or viral load (VL) measurement. From this sample we identified individuals aged ≥18 years with CHB between 01 January 1999 and 31 December 2019, defined as: i) record of a diagnostic SNOMED/Read code indicating CHB; or ii) record of a diagnostic ICD-9 or -10 code indicating CHB; and/or iii) presence of HBsAg or VL on ≥2 recordings ≥6 months apart. The characteristics of HBV infection in the cases we identified are further described elsewhere 7 .
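The case definition above can be read as a simple rule applied to each patient record. The following is a minimal sketch of that rule, not the code used in the study: the `PatientRecord` fields, the 182-day approximation of "6 months", and the way age is supplied are all illustrative assumptions, and the study window (1999 to 2019) is not modelled.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List

# Assumption: "6 months" is approximated as 182 days for illustration.
MIN_GAP = timedelta(days=182)


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's record."""
    age: int                                  # age in years (assumed known)
    has_snomed_read_chb_code: bool = False    # criterion i
    has_icd_chb_code: bool = False            # criterion ii
    # Dates on which HBsAg or viral load was recorded as present (criterion iii)
    positive_marker_dates: List[date] = field(default_factory=list)


def has_persistent_marker(dates: List[date]) -> bool:
    """True if at least two recordings lie >= MIN_GAP apart.

    Comparing the earliest and latest dates is sufficient: if any pair
    is far enough apart, the extreme pair is too.
    """
    if len(dates) < 2:
        return False
    return max(dates) - min(dates) >= MIN_GAP


def meets_chb_definition(rec: PatientRecord) -> bool:
    """Apply the adult age filter, then any of criteria i, ii or iii."""
    if rec.age < 18:
        return False
    return (
        rec.has_snomed_read_chb_code
        or rec.has_icd_chb_code
        or has_persistent_marker(rec.positive_marker_dates)
    )
```

Under these assumptions, a patient with positive markers recorded in January and August of the same year would meet criterion iii, whereas two recordings two months apart would not.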
We have also drawn on findings from a similar investigation previously undertaken in the Clinical Practice Research Datalink (CPRD) 8 , which is another UK primary care database containing EHRs for over 16 million patients. This previous investigation identified CHB individuals from patients registered in the database between 2000 and 2015.
This article can be found on medRxiv 9 .

Results and discussion
UK data for CHB epidemiology are summarised in Table 1.
Three of six estimates report information concerning population demographics, including number of infected individuals across age, sex and ethnicity categories. Among sources setting out to report prevalence, estimates varied from 0.27% (British Liver Trust / Department of Health and Social Care (DHSC) 2002 estimate) to 0.73% (estimate by the Polaris Institute). An alternative proxy for population prevalence is obtained via the UK antenatal screening programme, which reaches over 95% of pregnant women annually (approx. 700,000 women in the UK), with a CHB prevalence of <0.5% 10 . Differences between sources highlight varied sources of error, bias and missingness, problems with data linkage, and substantial "blind spots" in consistent testing and registration of HBV diagnoses.
As HBV is a notifiable disease in the UK, the UK Health Security Agency, UKHSA (previously Public Health England, PHE), has a comprehensive surveillance system for monitoring the burden of CHB, covering testing, blood donor screening and diagnoses across the care pathway. This incorporates data from diagnoses through to outcomes (including end-stage liver disease, transplantation, liver cancer and deaths) using laboratory testing surveillance, hospital activity datasets and registries (sentinel surveillance of blood-borne virus (BBV) testing, new laboratory diagnoses, hospital episode statistics, ONS cancer and deaths registries). However, these data have not yet been combined and incorporated in a statistical model to estimate prevalence. Sentinel surveillance captures testing in community, primary care and secondary care settings across a network of laboratories covering approximately 40% of the general population of England 11 . This likely gives the best estimate of diagnosed prevalence among a tested population, but because it combines acute incident infections and new diagnoses of pre-existing chronic infection, incidence and prevalence cannot be disaggregated.
The majority of diagnostic data are generated through testing individuals with risk factors for HBV infection or evidence of liver disease, and are therefore at risk of over-estimating true prevalence. However, no existing estimates factor in the undiagnosed burden, which represents the majority of people living with HBV infection (the WHO estimates that only 10.5% of people with CHB are aware of their infection status 3 ). Furthermore, the highest prevalence of CHB is in groups for whom provision of healthcare is inadequate, and/or access to healthcare is challenging (including migrants, sex workers, prisoners, and people experiencing homelessness). Overall, many gaps in the data remain, and it is most likely that estimates using primary care datasets considerably underestimate the true burden. In contrast, prevalence or test positivity among those accepting risk-based testing (as captured in laboratory testing surveillance) likely overestimates the overall population prevalence.
While UKHSA surveillance data may include some demographic characteristics (age, sex, postcode for deprivation), unless linked to other healthcare datasets, they typically lack more detailed clinical and demographic indicators (for example, measures of deprivation, lifestyle factors, assessment of liver disease, and HBV treatment coverage) which are needed to characterise the infected population. In contrast, EHR databases (such as CPRD and QResearch) have the advantage of collecting relevant demographic and clinical metadata which are not captured by UKHSA. However, linkage across data sources is disaggregated, and thereby each EHR-based estimate misses a portion of the infected population. For example, primary care data may not reflect testing conducted in secondary care 22 , blood safety (transfusion/transplantation) and laboratory data generated by other services, while secondary care data are typically only reliable for the sub-population enrolled in consistent hospital follow-up. Poor data flow between diagnostic testing and EHRs reflects a low clinical follow-up rate following a positive HBsAg test. This limited linkage to care reflects how services may not provide well for the CHB population, with gaps in referral pathways, inadequate communication and education (including translation services), and failures to deliver services to marginalised communities. Therefore, EHR databases offer the potential to characterise a subset of those infected with HBV, but do not currently generate a picture that is generalisable to the wider infected population, and cannot on their own be used to estimate prevalence.
Prevalence estimates for Hepatitis C virus (HCV) 23 and human immunodeficiency virus (HIV) 24 have recently been generated using multi-parameter evidence synthesis and back-calculation models. Similar modelling approaches to produce estimates of HBV incidence and prevalence in the UK are warranted.
Enhanced investment is needed to support the establishment of national registries with robust centralised data linkage between sources including national laboratory surveillance systems of BBV testing and new diagnoses, and thus determine which population subgroups are bearing the majority of the HBV disease burden. This will inform prevalence modelling and provide an evidence base for delivery of appropriate resources and interventions, and to benchmark progress towards elimination targets.

Summary box: Recommendations for the generation of enhanced insights into national CHB caseload
• Expansion of systematic screening, including opportunistic approaches (sexual health, antenatal, emergency medicine, people born in high-prevalence settings).
• Improved centralised data linkage between services, including laboratory records, blood and transplant services, primary and secondary care, supported by collection of metadata.
• Disaggregation of incidence/prevalence data where possible at source.
• Establishment of regional and/or national registries to collate linked data for HBV infection at a population level and within high risk groups.
• Mathematical modelling to optimise use of existing data to generate incidence and prevalence estimates, identify systematic data gaps, refine allocation of resources and predict progress towards elimination targets.

Ethics approval
QResearch ethics approval is with East Midlands-Derby Research Ethics Committee (reference 18/EM/0400).

Data availability
Only CC, TW, RB and JH-C have access to the QResearch individual-level patient data in order to ensure confidentiality of personal and health information, in accordance with the relevant licence agreements. QResearch data access is according to the information on the QResearch website (www.qresearch.org).

I appreciate that this is a narrative review, yet the information on what was searched (i.e., which keywords), in which locations (i.e., with which search engines/sources) and when (i.e., years in which databases were allowed) is unclear. It would also be helpful to include some notion of which databases were included and excluded. For instance, it seems that many small epidemiological studies intended to find prevalence or incidence in very specific demographic populations could be missed (which could help address some of the gaps stated by the authors). Including this information would help guide the reader as to how the authors arrived at their selection.

Author contributions
I also very much appreciate the summary box, which includes recommendations on how to achieve national CHB caseloads. But given the low prevalence of HBV and higher prevalence in specific key populations, would it make sense to aim for a national CHB caseload estimate? The aim of these recommendations should more reflect what the authors stated in their title, towards understanding the epidemiology of HBV in the United Kingdom. It would also be helpful to include a column in this box with the gap(s) in knowledge corresponding to the given recommendation, so the reader can identify the limitations of existing data (and hopefully avoid it in the future).
Finally, it is unclear why prevalence estimates were given in the results, or even why this was examined in the first place if the aim was to discuss data sources to understand HBV epidemiology (i.e., data sources that have an estimate of HBV prevalence). I would suggest that the authors make it clear why this statistic was included, possibly by adding it as a secondary objective.
-Abstract. "sources of error" refers to what exactly? This term could be deleted as it seems that bias was more the focal point of the discussion.
-Abstract. Although I agree with the sentence, "Multi-parameter evidence … may be applicable to HBV.", this does not really seem appropriate for the abstract (as the result backing this argument is missing in the Abstract).
-Introduction. The information on EHR is probably more appropriate for the discussion. The third paragraph in the introduction should focus not only on the limited data, but also the lack of summary on HBV data in the UK. This information could be used to help policy makers understand what exactly are the gaps in knowledge.
-Introduction. It should be stated clearly in the aims that this is a narrative review (for readers who may confuse this with a systematic review and meta-analysis).

Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
© 2023 Choi J. This is an open access peer review report distributed under the terms of the Creative Commons
© 2022 Mak L. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Lung-Yi Mak
Department of Medicine, Queen Mary Hospital, The University of Hong Kong, Hong Kong, China
This is an extremely important piece of work to estimate the epidemiology of HBsAg seropositivity in the UK. The authors summarized data from 6 sources, leading to a crude estimate for HBsAg seropositivity ranging from 0.27-0.73%. The following points should be addressed to further enrich the scientific contents and provide some directions for further research:
1. Ongoing pilot programs of universal HBV testing at Emergency Departments are being conducted in some NHS Trusts, especially after the COVID-19 pandemic. Would there be any chance that updated data from these pilot programs might have been recently reported and can be included in this piece?
2. Regional differences within the country, or even on a smaller scale, within a city, should also be appreciated. For instance, East London is likely to have a much higher prevalence of HBsAg+ than the West/North side of the city. While population-based data are very important, targeting high-risk groups (as the authors also addressed) is equally crucial to inform healthcare strategies such as resource allocation. This is also relevant to the UK in view of the low overall prevalence of HBsAg (<2%, according to data presented), and population-based screening may not be as cost-effective as in highly endemic regions. Although the CDC has updated its recommendations since 2022 to screen all adults for HBV infection at least once in their lifetime, the actual implementation of such an approach will depend heavily on the resources available. Therefore, it would be helpful if there were data for HBsAg seroprevalence in the high-risk groups; and in addition to those addressed in this article (sexual health, antenatal, emergency medicine, people born in high-prevalence settings), regions that are traditionally considered to be impoverished areas or known to harbour a relatively high proportion of immigrants should be the target groups for such opportunistic approaches.
3. Would there be any overlap between the laboratory-identified HBsAg+ cases from UKHSA and other data sources (i.e., the primary care database QResearch, the CPRD primary care database and the antenatal screening programme)?
4. Risk of bias and sampling error is well acknowledged. The authors may also include data from the NHS Blood and Transplant study published in Transfusion 2021, which reported an HBsAg seroprevalence of 6.9/100,000 donors between 2009 and 2018.

Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others?